Passenger data from the Titanic

The dataset contains information about the passengers of the RMS Titanic, which sank on April 15, 1912, after colliding with an iceberg. The data includes attributes such as travel class, age, gender, number of siblings/spouses aboard, number of parents/children aboard, ticket price, and embarkation point. The dataset also includes information on whether the passenger survived the disaster. The Titanic carried over 2,200 people, of which over 1,500 perished, making this disaster one of the most tragic in maritime history. Columns:

  • pclass - Ticket class
  • survived - Whether the passenger survived the disaster
  • name - Passenger's name
  • sex - Passenger's gender
  • age - Passenger's age
  • sibsp - Number of siblings/spouses aboard
  • parch - Number of parents/children aboard
  • ticket - Ticket number
  • fare - Ticket price
  • cabin - Cabin number
  • embarked - Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
  • boat - Lifeboat number
  • body - Body number (if the passenger did not survive and the body was recovered)
  • home.dest - Destination

Titanic Disaster in Numbers - Mateusz Nowakowski

titanic-submersible-oceangate-illustration-andrea-gatti.jpg

Exploratory Data Analysis

1. General Data Overview

The subject of the analysis is the Titanic disaster. There were a total of 2,212 people on the Titanic: 1,320 passengers and 892 crew members. We have an almost complete list of the passengers. Note that the analysis covers passengers only, as we have no data on the crew. The dataset has 1,310 rows and 14 columns and was compiled, as we can guess, after the disaster. Seven columns are numeric and the other seven contain strings. At first glance, we can see that the dataset has quite a few gaps and will require processing.

Source: https://en.wikipedia.org/wiki/Sinking_of_the_Titanic

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1310 entries, 0 to 1309
Data columns (total 14 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   pclass     1309 non-null   float64
 1   survived   1309 non-null   float64
 2   name       1309 non-null   object 
 3   sex        1309 non-null   object 
 4   age        1046 non-null   float64
 5   sibsp      1309 non-null   float64
 6   parch      1309 non-null   float64
 7   ticket     1309 non-null   object 
 8   fare       1308 non-null   float64
 9   cabin      295 non-null    object 
 10  embarked   1307 non-null   object 
 11  boat       486 non-null    object 
 12  body       121 non-null    float64
 13  home.dest  745 non-null    object 
dtypes: float64(7), object(7)
memory usage: 143.4+ KB

cafe-parisien-640x398.jpg

2. Analysis of Missing Values

The dataset has many missing values. We will count and discuss them below.

pclass survived name sex age sibsp parch ticket fare cabin embarked boat body home.dest
1309 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
0
pclass survived name sex age sibsp parch ticket fare cabin embarked boat body home.dest
1308 3.0 0.0 Zimmerman, Mr. Leo male 29.0 0.0 0.0 315082 7.875 NaN S NaN NaN NaN

Below we create a separate dataframe that contains the number of missing values in each column.

Let's discuss each column:

  • columns 'pclass', 'survived', 'name', 'sex', 'sibsp', 'parch' and 'ticket' contain no missing values. This is very good news, as these are key data for the analysis.
  • columns 'fare' and 'embarked' have only one and two missing values, respectively. We will fill the missing 'fare' value with the median, while the 'embarked' column can be removed, as it won't be needed for further analysis.
  • columns 'cabin' and 'home.dest' have many missing values, but these are rather peripheral data, so we will remove both columns.
  • column 'body', i.e. the body number, has the most missing values, about 91%. A missing body number tells us that the body was most likely not found, so here the missing value is itself valuable information. We will not interfere with this data.
  • column 'boat' is missing 823 records. These are probably people who, for various reasons, did not make it onto a lifeboat, so we won't fill in the missing data. The problem with this column, however, is that some passengers have been assigned to more than one boat. For this reason, the column will require transformation.
  • I left the 'age' column for last because it is a bit of a problem: this is very important information and we are missing 263 records, or 20%. I see two options here: 1. run any analysis involving age on a reduced sample, or 2. fill the gaps with the mean or median, which gives us a full sample; any age-related results will then be somewhat distorted but will apply to all passengers. Option 2 seems better.
Missing Values and Percentage
  Missing Values Percentage
pclass 0 0.000000
survived 0 0.000000
name 0 0.000000
sex 0 0.000000
age 263 20.091673
sibsp 0 0.000000
parch 0 0.000000
ticket 0 0.000000
fare 1 0.076394
cabin 1014 77.463713
embarked 2 0.152788
boat 823 62.872422
body 1188 90.756303
home.dest 564 43.086325
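
A table like the one above can be produced in a few lines of pandas. The snippet below is only a minimal sketch: the small `df` is a made-up stand-in for the Titanic dataframe, and the name `missing_summary` is my own, not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical sample standing in for the Titanic dataframe.
df = pd.DataFrame({
    "age": [29.0, np.nan, 2.0, np.nan],
    "fare": [211.34, 151.55, np.nan, 7.25],
    "sex": ["female", "male", "female", "male"],
})

# Count missing values per column and express them as a share of all rows.
missing = df.isna().sum()
missing_summary = pd.DataFrame({
    "Missing Values": missing,
    "Percentage": missing / len(df) * 100,
})
print(missing_summary)
```

Dividing by `len(df)` gives the share of all rows, so the empty last row should be dropped first if it is not meant to count.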

Titanic_wreck_bow.jpg

3. Data Transformation

Let me transform the data before we move on to analyzing individual variables. The data requires significant processing. I would like to change a few things before we start analysis and draw any conclusions.

Let's list all the changes:

  • we remove columns 'ticket', 'embarked', 'cabin' and 'home.dest' because they won't be needed for what I want to show you.
  • columns 'sibsp' (siblings/spouses) and 'parch' (parents/children) are combined into one column, 'with family', that only records whether someone traveled with family (1.0) or alone (0.0); that should be enough
  • we fix the data in the 'boat' column so that one passenger is assigned to only one boat.
  • missing values in the 'fare' column are replaced with median
  • missing values in the 'age' column are replaced with median
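
The steps listed above (except the 'boat' fix) could look roughly like this in pandas. This is a sketch under my own assumptions, not the notebook's original code: the sample rows are invented, and I assume "with family" means that 'sibsp' plus 'parch' is greater than zero.

```python
import numpy as np
import pandas as pd

# Small made-up sample with the same column names as the dataset.
df = pd.DataFrame({
    "sibsp": [0.0, 1.0, 0.0, 3.0],
    "parch": [0.0, 2.0, 0.0, 0.0],
    "age": [29.0, np.nan, 40.0, 2.0],
    "fare": [7.25, np.nan, 13.0, 30.0],
    "ticket": ["A1", "B2", "C3", "D4"],
    "embarked": ["S", "C", None, "Q"],
    "cabin": [None, "C22", None, None],
    "home.dest": [None, "New York, NY", None, None],
})

# Drop the columns that the analysis does not use.
df = df.drop(columns=["ticket", "embarked", "cabin", "home.dest"])

# 1.0 if the passenger had any sibling/spouse or parent/child aboard, else 0.0.
df["with family"] = ((df["sibsp"] + df["parch"]) > 0).astype(float)
df = df.drop(columns=["sibsp", "parch"])

# Fill gaps in 'fare' and 'age' with each column's median.
for col in ["fare", "age"]:
    df[col] = df[col].fillna(df[col].median())

print(df)
```

The median is computed from the known values only, so filling with it leaves the column's median unchanged, while the mean may shift slightly.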
pclass survived name sex age sibsp parch fare boat body
0 1.0 1.0 Allen, Miss. Elisabeth Walton female 29.0 0.0 0.0 211.3375 2 NaN
name survived sex age pclass with family fare boat body
902 Johnston, Mr. Andrew G 0.0 male NaN 3.0 1.0 23.4500 NaN NaN
636 Arnold-Franchi, Mrs. Josef (Josefine Franchi) 0.0 female 18.0 3.0 1.0 17.8000 NaN NaN
463 Jefferys, Mr. Ernest Wilfred 0.0 male 22.0 2.0 1.0 31.5000 NaN NaN
346 Botsford, Mr. William Hull 0.0 male 26.0 2.0 0.0 13.0000 NaN NaN
143 Harder, Mr. George Achilles 1.0 male 25.0 1.0 1.0 55.4417 5 NaN
boat
13         39
C          38
15         37
14         33
4          31
10         29
5          27
3          26
9          25
11         25
16         23
8          23
7          23
D          20
6          20
12         19
2          13
A          11
B           9
1           5
5 7         2
C D         2
13 15       2
5 9         1
8 10        1
13 15 B     1
15 16       1
Name: count, dtype: int64
boat
13    42
C     40
15    38
14    33
4     31
5     30
10    29
3     26
9     25
11    25
8     24
16    23
7     23
D     20
6     20
12    19
2     13
A     11
B      9
1      5
Name: count, dtype: int64
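
The two counts above are consistent with keeping only the first boat listed for each passenger (for example, "13 15 B" becomes "13"). Below is a minimal sketch of that approach, assuming this is the rule that was used; the sample values and the name `boat_fixed` are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the 'boat' column, including multi-boat entries.
boat = pd.Series(["13", "13 15", "C D", "5 7", np.nan, "13 15 B"], name="boat")

# Split on whitespace and keep the first boat; missing values stay missing.
boat_fixed = boat.str.split().str[0]
print(boat_fixed.value_counts())
```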
name survived sex age pclass with family fare boat body
334 Banfield, Mr. Frederick James 0.0 male 28.0 2.0 0.0 10.5000 NaN NaN
1078 O'Dwyer, Miss. Ellen "Nellie" 1.0 female 28.0 3.0 0.0 7.8792 NaN NaN
825 Goodwin, Master. Harold Victor 0.0 male 9.0 3.0 1.0 46.9000 NaN NaN
1296 Wirz, Mr. Albert 0.0 male 27.0 3.0 0.0 8.6625 NaN 131.0
296 Thayer, Mrs. John Borland (Marian Longstreth M... 1.0 female 39.0 1.0 1.0 110.8833 4 NaN
name              0
survived          0
sex               0
age               0
pclass            0
with family       0
fare              0
boat            823
body           1188
dtype: int64

df010f0f-d7ed-49e8-81a7-febeeff7b7ad-9-titanic-accommodation.jpg

4. Single Variable Analysis

Now that we have properly processed our dataframe, we will answer basic questions about individual variables:

  1. 'survived' - How many people survived the disaster?
  2. 'sex' - How many women and men were there?
  3. 'age' - Who were the youngest and oldest passengers and what was the average age?
  4. 'pclass' - How many people traveled in each class?
  5. 'with family' - How many people traveled alone versus with family?
  6. 'fare' - What was the cheapest and most expensive ticket, and what was the average and median ticket price?
  7. 'boat' - How many lifeboats were there and how many passengers were assigned to each lifeboat?
  8. 'body' - How many passengers who did not survive were assigned a body number?
Number and Percentage of Survivors
  Survivors Percentage
survived    
0.000000 809 62.000000
1.000000 500 38.000000
Number and Percentage of Passengers by Gender
  Gender Count Percentage
sex    
male 843 64.000000
female 466 36.000000
Youngest, Oldest, Average and Median Age
  age
min 0.166700
mean 29.503183
median 28.000000
max 80.000000
Number and Percentage of Passengers in Each Class
  Class Count Percentage
pclass    
3.000000 709 54.000000
1.000000 323 25.000000
2.000000 277 21.000000
Number and Percentage of Passengers Traveling Alone vs. With Family
  Family Count Percentage
with family    
0.000000 790 60.000000
1.000000 519 40.000000
Cheapest, Most Expensive, Average and Median Ticket Price
  fare
min 3.170800
mean 33.718995
median 14.500000
max 512.329200
Number of Passengers in Each Lifeboat
  People In the boat
boat  
13 42
C 40
15 38
14 33
4 31
5 30
10 29
3 26
9 25
11 25
8 24
16 23
7 23
D 20
6 20
12 19
2 13
A 11
B 9
1 5
People In the boat    486
dtype: int64
How many passengers who did not survive the disaster were assigned a body number?
  Total Not Survived Not Survived With Body Number Percentage
0 809 121 14.956737
No description has been provided for this image

Summary of the Analysis of Individual Variables

After an initial analysis of the variables, we learn that:

  1. 500 passengers survived the disaster, which represents only 38% of all passengers.
  2. Among the passengers, there were 843 men and 466 women, accounting for 64% and 36%, respectively.
  3. The youngest passenger was only two months old, the oldest was 80 years old, the average age was nearly 30 years, and the median age was 28 years.
  4. 323 (25%) passengers traveled in 1st class, 277 (21%) in 2nd class, and as many as 709 (54%) in 3rd class.
  5. 790 (60%) passengers traveled without family members, while 519 (40%) traveled with family.
  6. The cheapest ticket among the passengers was free*, the cheapest non-free ticket cost 3.17 GBP, the average ticket price was 33.28 GBP, the median was 14.45 GBP, and the most expensive ticket was 512.33 GBP.
  7. There were 20 lifeboats on the Titanic; the dataset assigns 486 passengers to them.
  8. Only 121 of the 809 passengers who did not survive the disaster were assigned a body number, indicating that only about 15% of the bodies of all passengers who perished were recovered.

*This is neither an error nor a missing value - there were passengers on the Titanic with free tickets (funded by the carrier). We will discuss free tickets on the Titanic during the analysis of the outliers.
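
Most of the count-and-percentage tables in this section follow the same pattern. A minimal sketch of it is shown below; the five sample values are invented and the column labels are only illustrative.

```python
import pandas as pd

# Hypothetical 'survived' column: 0.0 = did not survive, 1.0 = survived.
survived = pd.Series([0.0, 1.0, 0.0, 0.0, 1.0], name="survived")

# Count each category and convert the counts to rounded percentages.
counts = survived.value_counts()
summary = pd.DataFrame({
    "Count": counts,
    "Percentage": (counts / counts.sum() * 100).round(),
})
print(summary)
```

The same pattern applies to 'sex', 'pclass' and 'with family'; only the input column changes.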

Olympic-Grand-Staircase.png

5. Analysis of Relationships Between Variables

At this stage, we will focus on examining the relationships between variables. Initially, we will investigate the survival chances of passengers. We will try to determine which variables played the most significant role in the fight for survival. Do survival chances depend on age, gender, wealth, or whether someone traveled alone or with family? We will also discuss other interesting topics. Below is a list of questions I intend to answer:

  1. How many men and women were there in relation to how many men and women survived the disaster?
  2. How many men and women were there in relation to how many men and women survived the disaster, broken down by class?
  3. How many children (<18) were there in each class - how many children survived in each class?
  4. How many elderly people (50+) were there in each class - how many 50+ people survived in each class?
  5. Did passengers who paid more for their tickets have a higher chance of survival, considering the class breakdown?
  6. Did passengers traveling with family have a higher chance of survival than those traveling alone, considering the class breakdown, and is there a correlation for men who have the lowest survival rate?
  7. Distribution of people in lifeboats, considering the class breakdown. Were first-class passengers privileged - did they have their own lifeboats and more space?
1. How many men and women were there in relation to how many men and women survived the disaster?
  Total Survived Total_Percentage Survived_Percentage
sex        
female 466 339 35.599694 67.800000
male 843 161 64.400306 32.200000
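
The table above contains two different kinds of percentages, and it is easy to mix them up. The sketch below recomputes both from the totals in the table: the share of all survivors that each sex makes up (which is what the Survived_Percentage column holds, e.g. 339 / 500) and the survival rate within each sex (survivors divided by that sex's total). The column names `Share_of_Survivors` and `Survival_Rate` are my own.

```python
import pandas as pd

# Totals and survivors per sex, copied from the table above.
by_sex = pd.DataFrame(
    {"Total": [466, 843], "Survived": [339, 161]},
    index=pd.Index(["female", "male"], name="sex"),
)

# Share of all survivors belonging to each sex (matches Survived_Percentage).
by_sex["Share_of_Survivors"] = by_sex["Survived"] / by_sex["Survived"].sum() * 100

# Survival rate within each sex: survivors divided by that sex's total.
by_sex["Survival_Rate"] = by_sex["Survived"] / by_sex["Total"] * 100

print(by_sex.round(1))
```

The survival rate is the figure that answers "what were a woman's or a man's chances": about 72.7% for women and 19.1% for men.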
2. How many men and women were there in each class - how many men and women survived the disaster?
Count Type Total Survived Survival Percentage
Sex female male female male female male
pclass            
1.000000 144 179 139 61 96.527778 34.078212
2.000000 106 171 94 25 88.679245 14.619883
3.000000 216 493 106 75 49.074074 15.212982
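
A class-by-sex survival table like the one above can be sketched with `pivot_table`. Because 'survived' is coded as 0.0/1.0, its mean within a group is the survival rate. The six sample rows are made up for illustration and are not the real data.

```python
import pandas as pd

# Hypothetical sample with the dataset's column names.
df = pd.DataFrame({
    "pclass": [1.0, 1.0, 1.0, 3.0, 3.0, 3.0],
    "sex": ["female", "male", "male", "female", "female", "male"],
    "survived": [1.0, 0.0, 1.0, 1.0, 0.0, 0.0],
})

# Mean of the 0/1 'survived' flag per (class, sex) cell, as a percentage.
rates = df.pivot_table(
    index="pclass", columns="sex", values="survived", aggfunc="mean"
) * 100
print(rates)
```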
3. How many children were there in each class - how many children survived in each class?
  Total Children Survived Survived (%)
pclass      
1.000000 15 13 86.666667
2.000000 33 29 87.878788
3.000000 106 39 36.792453
4. How many elderly people (50+) were there in each class - how many elderly people (50+) survived in each class?
  Total Elders Survived Survived (%)
pclass      
1.000000 64 34 53.125000
2.000000 20 3 15.000000
3.000000 11 1 9.090909
5. Did passengers who paid more for their tickets have a higher chance of survival?
  Class Top 20% Survived Top 20% Total Top 20% Survival Rate (%) Bottom 20% Survived Bottom 20% Total Bottom 20% Survival Rate (%)
0 1.000000 45.000000 65 69.230769 31.000000 69 44.927536
1 2.000000 36.000000 60 60.000000 15.000000 56 26.785714
2 3.000000 38.000000 147 25.850340 50.000000 196 25.510204
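
One possible way to build such a comparison is to take, within each class, the passengers at or above the 80th percentile of 'fare' and those at or below the 20th percentile. I am assuming this is roughly how the table was produced; the cut-offs, the sample data and the variable names below are illustrative only.

```python
import pandas as pd

# Hypothetical sample: ten first-class passengers with increasing fares.
df = pd.DataFrame({
    "pclass": [1.0] * 10,
    "fare": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "survived": [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0],
})

rows = []
for pclass, group in df.groupby("pclass"):
    # Fare thresholds for the cheapest and the most expensive fifth of the class.
    low = group["fare"].quantile(0.2)
    high = group["fare"].quantile(0.8)
    top = group[group["fare"] >= high]
    bottom = group[group["fare"] <= low]
    rows.append({
        "Class": pclass,
        "Top 20% Total": len(top),
        "Top 20% Survival Rate (%)": top["survived"].mean() * 100,
        "Bottom 20% Total": len(bottom),
        "Bottom 20% Survival Rate (%)": bottom["survived"].mean() * 100,
    })

result = pd.DataFrame(rows)
print(result)
```

With many identical fares, as in 3rd class, the two groups can differ in size because all tickets tied at a threshold fall on the same side of it.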
6. Survival rates of passengers with and without family, broken down by class. Survival rates of men with family
  Class Survived_Alone Survived_With_Family Total_Alone Total_With_Family Survival_Rate_Alone Survival_Rate_With_Family Male_Survived_Alone Male_Survived_With_Family Male_Total_Alone Male_Total_With_Family Male_Survival_Rate_Alone Male_Survival_Rate_With_Family
0 1.000000 82.000000 118.000000 160.000000 163.000000 51.250000 72.392638 32.000000 29.000000 108.000000 71.000000 29.629630 40.845070
1 2.000000 48.000000 71.000000 158.000000 119.000000 30.379747 59.663866 12.000000 13.000000 116.000000 55.000000 10.344828 23.636364
2 3.000000 109.000000 72.000000 472.000000 237.000000 23.093220 30.379747 53.000000 22.000000 372.000000 121.000000 14.247312 18.181818
7. Distribution of people in lifeboats with a breakdown by class. Were first-class passengers privileged - did they have their own lifeboats and more space?
pclass 1.000000 2.000000 3.000000
boat      
1 5 0 0
10 8 15 6
11 6 14 5
12 0 17 2
13 1 12 29
14 5 23 5
15 1 1 36
16 0 3 20
2 7 0 6
3 26 0 0
4 24 7 0
5 30 0 0
6 19 0 1
7 22 1 0
8 24 0 0
9 6 16 3
A 3 0 8
B 3 1 5
C 2 0 38
D 9 2 9
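
A boat-by-class table of this kind is a plain cross-tabulation. Below is a minimal sketch with invented rows; `first_class_share` is an extra column of my own that makes the "almost exclusively first class" observation easier to check.

```python
import pandas as pd

# Hypothetical sample; the last passenger has no boat assigned.
df = pd.DataFrame({
    "boat": ["3", "3", "13", "13", "C", None],
    "pclass": [1.0, 1.0, 2.0, 3.0, 3.0, 3.0],
})

# Only passengers with a known boat can be placed in the table.
known = df.dropna(subset=["boat"])
boats_by_class = pd.crosstab(known["boat"], known["pclass"])

# Percentage of each boat's passengers who traveled first class.
first_class_share = boats_by_class[1.0] / boats_by_class.sum(axis=1) * 100
print(boats_by_class)
print(first_class_share)
```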
No description has been provided for this image

luxury-lounge-on-board-the-rms-titanic-news-photo-1643817466.jpg

Summary of the Analysis of Relationships Between Variables

  1. There were 466 women and 843 men on board. About 73% of women (339 of 466) survived the disaster, compared to only about 19% of men (161 of 843). Put differently, women made up 68% of all survivors and men 32%.
  2. In the first class, 97% of women and 34% of men survived. In the second class, 89% of women and only 15% of men survived. In the third class, only 49% of women and 15% of men survived.
  3. In the first class, 87% of children under 18 survived. In the second class, 88% survived, and in the third class, only 37% survived.
  4. In the first class, 53% of passengers aged 50+ survived. In the second class, 15% survived, and in the third class, 9% survived.
  5. Passengers in the 1st and 2nd class who paid more for their tickets had a better chance of survival. The ticket price in the 3rd class did not affect survival rates.
  6. Traveling with family had a positive impact on survival rates. This trend is visible among 1st and 2nd class passengers, likely because wealthier women, who had the highest survival rates, typically did not travel alone at that time. I specifically highlighted men to see if those traveling with and caring for their families had a better chance of survival than those traveling alone. The table above suggests they did: men with family survived more often than men traveling alone in every class (41% vs. 30% in 1st class, 24% vs. 10% in 2nd, and 18% vs. 14% in 3rd), although the gap is small in 3rd class. ChatGPT, citing articles on the disaster, likewise claims they definitely did.
  7. Officially, there was no class division during evacuation. Unofficially, first-class passengers were highly privileged compared to others. This was due to their cabins being closest to the lifeboats and the crew assisting first-class passengers first. First-class passengers were also the first to be informed by the crew about the severity of the situation. This information was crucial, as most passengers realized too late that the unsinkable Titanic would indeed sink. The chart above shows that boats 3-8 were almost exclusively filled with first-class passengers. Our DataFrame does not contain crew data, but we know from other sources that lifeboats were not full. This was likely to ensure more comfort for first-class passengers. The chaos and panic during evacuation and lack of proper crew training were contributing factors.

Source: ChatGPT, when asked for sources, cites the books: "A Night to Remember" by Walter Lord, "The Loss of the S.S. Titanic" by Lawrence Beesley, "Titanic: A Voyage of Discovery" by Tror Rowe. It's interesting to consider whether it actually has these books in its datasets or just claims to for appearance.

scan0005.jpeg

6. Analysis of Outliers

At this stage, we will examine values and information that significantly differ from the rest of the data. We will answer the following questions:

  1. How many small children under 2 (outliers) were there, and what was their survival rate?
  2. Who were the individuals with the most expensive tickets, and what was included in the ticket price that made it about 160 times more expensive than the cheapest paid ticket?
  3. How many people had free tickets, why, and who were they?
  4. Why were there only 5 passengers in lifeboat number one, and who were they?
  5. Were there individuals who made it to a lifeboat but did not survive?
  6. Were there individuals who did not make it to any lifeboat but somehow survived?
No description has been provided for this image
1. How many small children under 2 (outliers) were there, and what was their survival rate?
  Total Children Survived Survived (%)
pclass      
1.000000 1 1 100.000000
2.000000 7 7 100.000000
3.000000 14 9 64.285714
2. Who were the people who had the most expensive tickets, and what was included in the ticket price that made it about 160 times more expensive than the cheapest paid one?
    Total Rich Survived Survived (%)
pclass sex      
1.000000 female 53 50 94.339623
male 31 10 32.258065
No description has been provided for this image
Passengers holding the most expensive tickets (512.33 GBP)
  name survived sex age pclass with family fare boat body
49 Cardeza, Mr. Thomas Drake Martinez 1.000000 male 36.000000 1.000000 1.000000 512.329200 3 nan
50 Cardeza, Mrs. James Warburton Martinez (Charlotte Wardle Drake) 1.000000 female 58.000000 1.000000 1.000000 512.329200 3 nan
183 Lesurer, Mr. Gustave J 1.000000 male 35.000000 1.000000 0.000000 512.329200 3 nan
302 Ward, Miss. Anna 1.000000 female 35.000000 1.000000 0.000000 512.329200 3 nan
3. How many people had a free ticket? Why? Who were they?
  name survived sex age pclass with family fare boat body
1 Andrews, Mr. Thomas Jr 0.000000 male 39.000000 1.000000 0.000000 0.000000 nan nan
2 Chisholm, Mr. Roderick Robert Crispin 0.000000 male 28.000000 1.000000 0.000000 0.000000 nan nan
3 Fry, Mr. Richard 0.000000 male 28.000000 1.000000 0.000000 0.000000 nan nan
4 Harrison, Mr. William 0.000000 male 40.000000 1.000000 0.000000 0.000000 nan 110.000000
5 Ismay, Mr. Joseph Bruce 1.000000 male 49.000000 1.000000 0.000000 0.000000 C nan
6 Parr, Mr. William Henry Marsh 0.000000 male 28.000000 1.000000 0.000000 0.000000 nan nan
7 Reuchlin, Jonkheer. John George 0.000000 male 38.000000 1.000000 0.000000 0.000000 nan nan
8 Campbell, Mr. William 0.000000 male 28.000000 2.000000 0.000000 0.000000 nan nan
9 Cunningham, Mr. Alfred Fleming 0.000000 male 28.000000 2.000000 0.000000 0.000000 nan nan
10 Frost, Mr. Anthony Wood "Archie" 0.000000 male 28.000000 2.000000 0.000000 0.000000 nan nan
11 Knight, Mr. Robert J 0.000000 male 28.000000 2.000000 0.000000 0.000000 nan nan
12 Parkes, Mr. Francis "Frank" 0.000000 male 28.000000 2.000000 0.000000 0.000000 nan nan
13 Watson, Mr. Ennis Hastings 0.000000 male 28.000000 2.000000 0.000000 0.000000 nan nan
14 Johnson, Mr. Alfred 0.000000 male 49.000000 3.000000 0.000000 0.000000 nan nan
15 Johnson, Mr. William Cahoone Jr 0.000000 male 19.000000 3.000000 0.000000 0.000000 nan nan
16 Leonard, Mr. Lionel 0.000000 male 36.000000 3.000000 0.000000 0.000000 nan nan
17 Tornquist, Mr. William Henry 1.000000 male 25.000000 3.000000 0.000000 0.000000 15 nan
4. Why were there only 5 passengers in lifeboat number one and who were they?
  name survived sex age pclass with family fare boat body
1 Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan") 1.000000 female 48.000000 1.000000 1.000000 39.600000 1 nan
2 Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan") 1.000000 male 49.000000 1.000000 1.000000 56.929200 1 nan
3 Francatelli, Miss. Laura Mabel 1.000000 female 30.000000 1.000000 0.000000 56.929200 1 nan
4 Salomon, Mr. Abraham L 1.000000 male 28.000000 1.000000 0.000000 26.000000 1 nan
5 Stengel, Mr. Charles Emil Henry 1.000000 male 54.000000 1.000000 1.000000 55.441700 1 nan
5. Were there people who got on a lifeboat but did not survive?
  name survived sex age pclass with family fare boat body
1 Beattie, Mr. Thomson 0.000000 male 36.000000 1.000000 0.000000 75.241700 A nan
2 Hoyt, Mr. William Fisher 0.000000 male 28.000000 1.000000 0.000000 30.695800 14 nan
3 Renouf, Mr. Peter Henry 0.000000 male 34.000000 2.000000 1.000000 21.000000 12 nan
4 Backstrom, Mr. Karl Alfred 0.000000 male 32.000000 3.000000 1.000000 15.850000 D nan
5 Harmer, Mr. Abraham (David Lishin) 0.000000 male 25.000000 3.000000 0.000000 7.250000 B nan
6 Keefe, Mr. Arthur 0.000000 male 28.000000 3.000000 0.000000 7.250000 A nan
7 Lindell, Mr. Edvard Bengtsson 0.000000 male 36.000000 3.000000 1.000000 15.550000 A nan
8 Lindell, Mrs. Edvard Bengtsson (Elin Gerda Persson) 0.000000 female 30.000000 3.000000 1.000000 15.550000 A nan
9 Yasbeck, Mr. Antoni 0.000000 male 27.000000 3.000000 1.000000 14.454200 C nan
6. Were there people who did not get on any lifeboat but somehow survived?
  name survived sex age pclass with family fare boat body
1 Lurette, Miss. Elise 1.000000 female 58.000000 1.000000 0.000000 146.520800 nan nan
2 Bystrom, Mrs. (Karolina) 1.000000 female 42.000000 2.000000 0.000000 13.000000 nan nan
3 Doling, Miss. Elsie 1.000000 female 18.000000 2.000000 1.000000 23.000000 nan nan
4 Doling, Mrs. John T (Ada Julia Bone) 1.000000 female 34.000000 2.000000 1.000000 23.000000 nan nan
5 Ilett, Miss. Bertha 1.000000 female 17.000000 2.000000 0.000000 10.500000 nan nan
6 Louch, Mrs. Charles Alexander (Alice Adelaide Slow) 1.000000 female 42.000000 2.000000 1.000000 26.000000 nan nan
7 Nasser, Mrs. Nicholas (Adele Achem) 1.000000 female 14.000000 2.000000 1.000000 30.070800 nan nan
8 Renouf, Mrs. Peter Henry (Lillian Jefferys) 1.000000 female 30.000000 2.000000 1.000000 21.000000 nan nan
9 Trout, Mrs. William H (Jessie L) 1.000000 female 28.000000 2.000000 0.000000 12.650000 nan nan
10 Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson) 1.000000 female 33.000000 3.000000 1.000000 15.850000 nan nan
11 Drapkin, Miss. Jennie 1.000000 female 23.000000 3.000000 0.000000 8.050000 nan nan
12 Heikkinen, Miss. Laina 1.000000 female 26.000000 3.000000 0.000000 7.925000 nan nan
13 Honkanen, Miss. Eliina 1.000000 female 27.000000 3.000000 0.000000 7.925000 nan nan
14 Kennedy, Mr. John 1.000000 male 28.000000 3.000000 0.000000 7.750000 nan nan
15 McCormack, Mr. Thomas Joseph 1.000000 male 28.000000 3.000000 0.000000 7.750000 nan nan
16 McGowan, Miss. Anna "Annie" 1.000000 female 15.000000 3.000000 0.000000 8.029200 nan nan
17 Moussa, Mrs. (Mantoura Boulos) 1.000000 female 28.000000 3.000000 0.000000 7.229200 nan nan
18 O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey) 1.000000 female 28.000000 3.000000 1.000000 15.500000 nan nan
19 O'Dwyer, Miss. Ellen "Nellie" 1.000000 female 28.000000 3.000000 0.000000 7.879200 nan nan
20 Osman, Mrs. Mara 1.000000 female 31.000000 3.000000 0.000000 8.683300 nan nan
21 Shine, Miss. Ellen Natalia 1.000000 female 28.000000 3.000000 0.000000 7.779200 nan nan
22 Wilkes, Mrs. James (Ellen Needs) 1.000000 female 47.000000 3.000000 1.000000 7.000000 nan nan
23 Yasbeck, Mrs. Antoni (Selini Alexander) 1.000000 female 15.000000 3.000000 1.000000 14.454200 nan nan

Summary of Outlier Analysis

  1. There were a total of 22 small children under the age of 2 on the Titanic. Five of them did not survive, and all five traveled in 3rd class. Even belonging to a priority group was not enough to survive in 3rd class. Third-class passengers had difficulty reaching the lifeboats because a large part of the ship was inaccessible to them and was barred by gates.
  2. The most expensive ticket on the Titanic cost 512 GBP. This was about 35 times the median price of all tickets. The chart above illustrates the vast gap between the wealthiest and the average passengers. It was an astronomical sum for those times. For comparison, a house in England could be bought for 250 GBP. The price of the most expensive ticket included a "Parlor Suite" apartment with two bedrooms and a private patio. According to our list, only four people could afford such luxury. Additionally, we know that the rest of the 'Parlor Suite' apartments were not for sale but were reserved for the line's owners and VIP guests for promotional purposes. As it turned out, room service, electric blankets and pillows, and access to the gym were not as significant an advantage as the fact that these apartments were on the same deck as the lifeboats.
  3. According to the dataset, 17 people traveled on the Titanic for free. This is not a data error. Some passengers indeed did not pay for their tickets. In the first class, this included the chairman of the White Star Line, Mr. Joseph Bruce Ismay, and his colleagues and close friends (they occupied the remaining apartments described in the previous point). In the 2nd and 3rd classes, these were mainly contractors and employees of the line who were not part of the permanent crew - including members of the famous orchestra that played until the end.
  4. Lifeboat number one carried only five passengers. This is not a data error. The Duff Gordons and three other first-class passengers left the Titanic in what was effectively a private lifeboat. Sir Cosmo Duff Gordon was even accused of bribing the crew and refusing to help others but was later cleared.
  5. There are nine people who have a lifeboat number but did not survive. This is not a data error. Sources tell us that the boat marked A partially took on water, and passengers sat in it knee-deep in icy water. As for the remaining five people, we are not sure. It can be assumed that hypothermia played a significant role.
  6. There are 23 people who are not assigned to any boat but survived nonetheless. However, I suspect a data error here. Most of these people are young women from the 2nd and 3rd classes. I have not found any information that could confirm or deny this, but I suspect an error - such as not providing a boat number.
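
Points 5 and 6 are simple consistency checks between the 'survived' and 'boat' columns. A minimal sketch with four invented passengers is shown below; the variable names are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical sample covering all four combinations of survival and boat.
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "survived": [1.0, 0.0, 1.0, 0.0],
    "boat": ["5", "A", np.nan, np.nan],
})

# Passengers with a lifeboat number who nevertheless did not survive.
in_boat_but_died = df[(df["survived"] == 0) & df["boat"].notna()]

# Passengers marked as survivors who have no lifeboat number.
survived_without_boat = df[(df["survived"] == 1) & df["boat"].isna()]

print(in_boat_but_died)
print(survived_without_boat)
```

Rows in the second group are the ones worth double-checking against other sources, since a missing boat number for a survivor may simply be a gap in the data.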

Source: When asked about the source, ChatGPT cites books: "A Night to Remember" by Walter Lord, "The Loss of the S.S. Titanic" by Lawrence Beesley, "Titanic: A Voyage of Discovery" by Trev Rowe. It's interesting whether it actually has these books in its datasets or just claims so to 'look better.'

TitanicFirstClassDiningRoom-1.jpg

Observations and Final Conclusions

No description has been provided for this image

Finally, I added two charts that clearly illustrate the survival correlations. Yellow dots represent rescued individuals. Large dots indicate first class, medium dots indicate second class, and the smallest dots indicate third class.

What do the numbers tell us?

If we were to identify a single variable that had the greatest impact on survival, it was undoubtedly gender. The crew followed the principle of women and children first. This rule was imposed by both maritime law and custom. This is clearly reflected in the data. The second most important factor was class and ticket price - wealthier passengers were more privileged. The third factor was age, meaning that the youngest passengers had the highest chances of survival. The fourth factor was whether someone traveled with family or alone. However, this variable played a lesser role for the poorest passengers.

The data on class and ticket price perfectly reflect the social stratification of that time. The majority of victims were men traveling in third class. In this class, fatalities were even seen among the youngest passengers. The poorest passengers were isolated from the upper decks of the massive ship. They were also isolated from information. A large portion of passengers simply did not know they were in serious trouble. Even if there was space for them in the lifeboats, they either couldn't reach them or didn't know they should.

Could more people have been saved? The data suggests yes. There was still plenty of room in the lifeboats. However, this would not have significantly changed the scale of the tragedy. There were simply too few lifeboats. I believe we should not judge the decisions made during the evacuation. None of us knows how we would have acted. We should judge the decisions made "calmly" on an ordinary day in the comfort of one's office. These very decisions led to so many deaths. Evidence of this is the fact that after the disaster, a number of legal regulations regarding the number of lifeboats and navigation and speed in difficult sea conditions were changed.

The biggest problem with the Titanic was not the lack of crew training, the chaotic and inept evacuation, the lack of lifeboats, or excessive speed. The biggest problem was that everyone - absolutely everyone - from the poorest to the wealthiest succumbed to the illusion that the Titanic was unsinkable. This illusion led to a series of poor decisions before the disaster and to panic and chaos during the disaster. The belief in one's own infallibility, arrogance, and blind faith in technology - these were the main causes of this epic tragedy. I wonder if things are different today.

the-luxury-dining-hall-of-the-rms-titanic-news-photo-1643817406.jpg
